81 research outputs found

    Bioinformatic analyses of mammalian 5'-UTR sequence properties of mRNAs predicts alternative translation initiation sites

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Utilization of alternative initiation sites for protein translation directed by non-AUG codons in mammalian mRNAs is observed with increasing frequency. Alternative initiation sites are utilized for the synthesis of important regulatory proteins that control distinct biological functions. It is, therefore, of high significance to define the parameters that allow accurate bioinformatic prediction of alternative translation initiation sites (aTIS). This study has investigated 5'-UTR regions of mRNAs to define consensus sequence properties and structural features that allow identification of alternative initiation sites for protein translation.</p> <p>Results</p> <p>Bioinformatic evaluation of 5'-UTR sequences of mammalian mRNAs was conducted for classification and identification of alternative translation initiation sites for a group of mRNA sequences that have been experimentally demonstrated to utilize alternative non-AUG initiation sites for protein translation. These are represented by the codons CUG, GUG, UUG, AUA, and ACG for aTIS. The first phase of this bioinformatic analysis implements a classification tree that evaluated 5'-UTRs for unique consensus sequence features near the initiation codon, characteristics of 5'-UTR nucleotide sequences, and secondary structural features in a decision tree that categorizes mRNAs into those with potential aTIS, and those without. The second phase addresses identification of the aTIS codon and its location. Critical parameters of 5'-UTRs were assessed by an Artificial Neural Network (ANN) for identification of the aTIS codon and its location. ANNs have previously been used for the purpose of AUG start site prediction and are applicable in complex. ANN analyses demonstrated that multiple properties were required for predicting aTIS codons; these properties included unique consensus nucleotide sequences at positions -7 and -6 combined with positions -3 and +4, 5'-UTR length, ORF length, predicted secondary structures, free energy features, upstream AUGs, and G/C ratio. Importantly, combined results of the classification tree and the ANN analyses provided highly accurate bioinformatic predictions of alternative translation initiation sites.</p> <p>Conclusion</p> <p>This study has defined the unique properties of 5'-UTR sequences of mRNAs for successful bioinformatic prediction of alternative initiation sites utilized in protein translation. The ability to define aTIS through the described bioinformatic analyses can be of high importance for genomic analyses to provide full predictions of translated mammalian and human gene products required for cellular functions in health and disease.</p

    Assessing the Gene Content of the Megagenome: Sugar Pine (Pinus lambertiana).

    Get PDF
    Sugar pine (Pinus lambertiana Douglas) is within the subgenus Strobus with an estimated genome size of 31 Gbp. Transcriptomic resources are of particular interest in conifers due to the challenges presented in their megagenomes for gene identification. In this study, we present the first comprehensive survey of the P. lambertiana transcriptome through deep sequencing of a variety of tissue types to generate more than 2.5 billion short reads. Third generation, long reads generated through PacBio Iso-Seq have been included for the first time in conifers to combat the challenges associated with de novo transcriptome assembly. A technology comparison is provided here to contribute to the otherwise scarce comparisons of second and third generation transcriptome sequencing approaches in plant species. In addition, the transcriptome reference was essential for gene model identification and quality assessment in the parallel project responsible for sequencing and assembly of the entire genome. In this study, the transcriptomic data were also used to address questions surrounding lineage-specific Dicer-like proteins in conifers. These proteins play a role in the control of transposable element proliferation and the related genome expansion in conifers

    Dual RNA-Seq analysis of the pine-Fusarium circinatum interaction in resistant (Pinus tecunumanii) and susceptible (Pinus patula) hosts

    Get PDF
    Fusarium circinatum poses a serious threat to many pine species in both commercial and natural pine forests. Knowledge regarding the molecular basis of pine-F. circinatum host-pathogen interactions could assist efforts to produce more resistant planting stock. This study aimed to identify molecular responses underlying resistance against F. circinatum. A dual RNA-seq approach was used to investigate host and pathogen expression in F. circinatum challenged Pinus tecunumanii (resistant) and Pinus patula (susceptible), at three- and seven-days post inoculation. RNA-seq reads were mapped to combined host-pathogen references for both pine species to identify differentially expressed genes (DEGs). F. circinatum genes expressed during infection showed decreased ergosterol biosynthesis in P. tecunumanii relative to P. patula. For P. tecunumanii, enriched gene ontologies and DEGs indicated roles for auxin-, ethylene-, jasmonate- and salicylate-mediated phytohormone signalling. Correspondingly, key phytohormone signaling components were down-regulated in P. patula. Key F. circinatum ergosterol biosynthesis genes were expressed at lower levels during infection of the resistant relative to the susceptible host. This study further suggests that coordination of phytohormone signaling is required for F. circinatum resistance in P. tecunumanii, while a comparatively delayed response and impaired phytohormone signaling contributes to susceptibility in P. patula.The National Research Foundation (NRF) of South Africa Scarce Skills Doctoral Scholarship Programme (Grant ID: 97892), the NRF Bioinformatics and Functional Genomics Programme (Grant IDs: 86936, 97911) and a strategic grant from the Department of Science and Technology (DST) for the Tree Genomics Platform at the University of Pretoria. Further support was provided by Sappi, Mondi, York Timbers and Hans Merensky Foundation though the Forest Molecular Genetics (FMG) Programme with co-funding from the Technology and Human Resources for Industry Programme (THRIP, Grant ID: 96413).http://www.mdpi.com/journal/microorganismsam2020BiochemistryForestry and Agricultural Biotechnology Institute (FABI)GeneticsMicrobiology and Plant Patholog

    Combined de novo and genome guided assembly and annotation of the Pinus patula juvenile shoot transcriptome

    Get PDF
    BACKGROUND : Pines are the most important tree species to the international forestry industry, covering 42 % of the global industrial forest plantation area. One of the most pressing threats to cultivation of some pine species is the pitch canker fungus, Fusarium circinatum, which can have devastating effects in both the field and nursery. Investigation of the Pinus-F. circinatum host-pathogen interaction is crucial for development of effective disease management strategies. As with many non-model organisms, investigation of host-pathogen interactions in pine species is hampered by limited genomic resources. This was partially alleviated through release of the 22 Gbp Pinus taeda v1.01 genome sequence (http://pinegenome.org/pinerefseq/) in 2014. Despite the fact that the fragmented state of the genome may hamper comprehensive transcriptome analysis, it is possible to leverage the inherent redundancy resulting from deep RNA sequencing with Illumina short reads to assemble transcripts in the absence of a completed reference sequence. These data can then be integrated with available genomic data to produce a comprehensive transcriptome resource. The aim of this study was to provide a foundation for gene expression analysis of disease response mechanisms in Pinus patula through transcriptome assembly. RESULTS : Eighteen de novo and two reference based assemblies were produced for P. patula shoot tissue. For this purpose three transcriptome assemblers, Trinity, Velvet/OASES and SOAPdenovo-Trans, were used to maximise diversity and completeness of assembled transcripts. Redundancy in the assembly was reduced using the EvidentialGene pipeline. The resulting 52 Mb P. patula v1.0 shoot transcriptome consists of 52 112 unigenes, 60 % of which could be functionally annotated. CONCLUSIONS : The assembled transcriptome will serve as a major genomic resource for future investigation of P. patula and represents the largest gene catalogue produced to date for this species. Furthermore, this assembly can help detect gene-based genetic markers for P. patula and the comparative assembly workflow could be applied to generate similar resources for other non-model species.Additional file 1: Table S1. EvidentialGene tr2aacds pipeline output summary.Additional file 2: Table S2. Assembly statistics for EvidentialGene tr2aacds pipeline merged assembly compared to average statistics for each assembler.Additional file 3: Table S3. Predicted species distribution for non-pine origin sequences removed from the Pinus patula v1.0 transcriptome.Additional file 4: Figure S1. Molecular function gene ontology distribution for the Pinus patula v1.0 transcriptome.Additional file 5: Table S4. Tribe-MCL gene families and annotations for all 15 species used.Additional file 6: Table S5. Conditional reciprocal best BLAST alignment results between full-length Sanger sequenced Pinus taeda cDNA and representative Pinus patula transcripts for each cDNA.Additional file 7: Figure S2. Summary statistics for alignment of Pinus taeda complete CDS sequences to assembled Pinus patula transcripts. Pita = P. taeda. The x-axis represents the query P. taeda cDNA sequence. The solid y-axis (left) illustrates: cDNA query sequence length (pink circle), P. patula subject sequence length (blue square), conditional reciprocal best BLAST alignment length (gold triangle). The dashed y-axis (right) depicts the: percentage identity between sequences (black line), percentage coverage of the P. taeda cDNA by the corresponding P. patula transcript (green cross) and vice versa (purple plus).Additional file 8: Table S6. EBSeq differential expression analysis results comparing expression between inoculated and mock-inoculated data.Additional file 9: Table S7. Summarized list of differentially expressed genes between inoculated and mock-inoculated data with annotations.Forestry South Africa (for seed funding), the Genomics Research Institute (GRI) and the National Research Foundation’s (NRF) Bioinformatics and Functional Genomics Programme (NBFG, UID:71255) as well as Innovation, Thuthuka and THRIP grants (Grant numbers: 84951, 86936, 87912).http://www.biomedcentral.com/bmcgenomicsam201

    The Douglas-Fir Genome Sequence Reveals Specialization of the Photosynthetic Apparatus in Pinaceae.

    Get PDF
    A reference genome sequence for Pseudotsuga menziesii var. menziesii (Mirb.) Franco (Coastal Douglas-fir) is reported, thus providing a reference sequence for a third genus of the family Pinaceae. The contiguity and quality of the genome assembly far exceeds that of other conifer reference genome sequences (contig N50 = 44,136 bp and scaffold N50 = 340,704 bp). Incremental improvements in sequencing and assembly technologies are in part responsible for the higher quality reference genome, but it may also be due to a slightly lower exact repeat content in Douglas-fir vs. pine and spruce. Comparative genome annotation with angiosperm species reveals gene-family expansion and contraction in Douglas-fir and other conifers which may account for some of the major morphological and physiological differences between the two major plant groups. Notable differences in the size of the NDH-complex gene family and genes underlying the functional basis of shade tolerance/intolerance were observed. This reference genome sequence not only provides an important resource for Douglas-fir breeders and geneticists but also sheds additional light on the evolutionary processes that have led to the divergence of modern angiosperms from the more ancient gymnosperms

    YeATS - a tool suite for analyzing RNA-seq derived transcriptome identifies a highly transcribed putative extensin in heartwood/sapwood transition zone in black walnut

    Get PDF
    The transcriptome provides a functional footprint of the genome by enumerating the molecular components of cells and tissues. The field of transcript discovery has been revolutionized through high-throughput mRNA sequencing (RNA-seq). Here, we present a methodology that replicates and improves existing methodologies, and implements a workflow for error estimation and correction followed by genome annotation and transcript abundance estimation for RNA-seq derived transcriptome sequences (YeATS - Yet Another Tool Suite for analyzing RNA-seq derived transcriptome). A unique feature of YeATS is the upfront determination of the errors in the sequencing or transcript assembly process by analyzing open reading frames of transcripts. YeATS identifies transcripts that have not been merged, result in broken open reading frames or contain long repeats as erroneous transcripts. We present the YeATS workflow using a representative sample of the transcriptome from the tissue at the heartwood/sapwood transition zone in black walnut. A novel feature of the transcriptome that emerged from our analysis was the identification of a highly abundant transcript that had no known homologous genes (GenBank accession: KT023102). The amino acid composition of the longest open reading frame of this gene classifies this as a putative extensin. Also, we corroborated the transcriptional abundance of proline-rich proteins, dehydrins, senescence-associated proteins, and the DNAJ family of chaperone proteins. Thus, YeATS presents a workflow for analyzing RNA-seq data with several innovative features that differentiate it from existing software

    YeATS - a tool suite for analyzing RNA-seq derived transcriptome identifies a highly transcribed putative extensin in heartwood/sapwood transition zone in black walnut [version 2; referees: 3 approved]

    Get PDF
    The transcriptome provides a functional footprint of the genome by enumerating the molecular components of cells and tissues. The field of transcript discovery has been revolutionized through high-throughput mRNA sequencing (RNA-seq). Here, we present a methodology that replicates and improves existing methodologies, and implements a workflow for error estimation and correction followed by genome annotation and transcript abundance estimation for RNA-seq derived transcriptome sequences (YeATS - Yet Another Tool Suite for analyzing RNA-seq derived transcriptome). A unique feature of YeATS is the upfront determination of the errors in the sequencing or transcript assembly process by analyzing open reading frames of transcripts. YeATS identifies transcripts that have not been merged, result in broken open reading frames or contain long repeats as erroneous transcripts. We present the YeATS workflow using a representative sample of the transcriptome from the tissue at the heartwood/sapwood transition zone in black walnut. A novel feature of the transcriptome that emerged from our analysis was the identification of a highly abundant transcript that had no known homologous genes (GenBank accession: KT023102). The amino acid composition of the longest open reading frame of this gene classifies this as a putative extensin. Also, we corroborated the transcriptional abundance of proline-rich proteins, dehydrins, senescence-associated proteins, and the DNAJ family of chaperone proteins. Thus, YeATS presents a workflow for analyzing RNA-seq data with several innovative features that differentiate it from existing software
    • …
    corecore